Embedded-Text Detection and Its Application to Anti-Spam Filtering

Authors

  • Ching-Tung Wu
  • Yuan-Fang Wang
  • Matthew Turk
  • Kwang-Ting Cheng
Abstract

Embedded text in images usually carries important messages about the content. In the past, several algorithms have been proposed to detect text boxes in video frames. Previous work often followed a multi-step framework using a combination of image-analysis and machine-learning techniques. In this work, we propose a unified embedded-text detection framework to efficiently and accurately locate text boxes, particularly in web and email images. We approach the embedded-text problem from the angle of object detection. We define position-independent features to capture the essence of characters and a smart-scan algorithm to trace text lines using their spatial and geometrical properties. We also propose a novel anti-spam system that utilizes visual clues, including the embedded-text information. The experimental results demonstrate the effectiveness of the proposed embedded-text detection framework and the anti-spam filtering system.

Thesis Committee Chair: Professor Kwang-Ting Cheng


Similar resources

Spam Filtering Based On The Analysis Of Text Information Embedded Into Images

In recent years anti-spam filters have become necessary tools for Internet service providers to face up to the continuously growing spam phenomenon. Current server-side anti-spam filters are made up of several modules aimed at detecting different features of spam e-mails. In particular, text categorisation techniques have been investigated by researchers for the design of modules for the analys...


A Sobel Edge Detection Algorithm Based System for Analyzing and Classifying Image Based Spam

Early spam mails were only text-based; however, spammers have moved to more sophisticated spamming techniques that involve images, now generally termed image-based spam. In most image-based spam, the entire spam message, which may sometimes be text, is embedded in an image of any format. This type of spam email creates another dimension to the spam filtering problem. Extracting text f...


Image Spam Filtering by Content Obscuring Detection

We address the problem of filtering image spam, a rapidly spreading kind of spam in which the text message is embedded into attached images to defeat spam filtering techniques based on the analysis of e-mail’s body text. We propose an approach based on low-level image processing techniques to detect one of the main characteristics of most image spam, namely the use of content obscuring technique...


Filtering Image Spam with Near-Duplicate Detection

A new trend in email spam is the emergence of image spam. Although current anti-spam technologies are quite successful in filtering text-based spam emails, the new image spams are substantially more difficult to detect, as they employ a variety of image creation and randomization algorithms. Spam image creation algorithms are designed to defeat well-known vision algorithms such as optical chara...


Fusion of Text and Image Features: A New Approach to Image Spam Filtering

While enjoying the convenience of email communications, many users have also experienced annoying email spam. Even though current spam-detection approaches have gained a competitive edge against text-based email spam, they still face the challenge arising from image-based spam (image spam for short). Image spam normally includes embedded images that contain the spam messages in binary format rath...




Publication date: 2005